13 research outputs found

    Deep visible and thermal image fusion for enhanced pedestrian visibility

    Reliable vision in challenging illumination conditions is one of the crucial requirements of future autonomous automotive systems. In the last decade, thermal cameras have become more easily accessible to a larger number of researchers. This has resulted in numerous studies which confirm the benefits of thermal cameras in limited visibility conditions. In this paper, we propose a learning-based method for visible and thermal image fusion that focuses on generating fused images with high visual similarity to regular truecolor (red-green-blue or RGB) images, while introducing new informative details in pedestrian regions. The goal is to create natural, intuitive images that would be more informative to a human driver than a regular RGB camera in challenging visibility conditions. The main novelty of this paper is the idea to rely on two types of objective functions for optimization: a similarity metric between the RGB input and the fused output to achieve a natural image appearance, and an auxiliary pedestrian detection error that helps define relevant features of human appearance and blend them into the output. We train a convolutional neural network using image samples from variable conditions (day and night) so that the network learns the appearance of humans in the different modalities and produces more robust results applicable in realistic situations. Our experiments show that the visibility of pedestrians is noticeably improved, especially in dark regions and at night. Compared to existing methods, we can better learn context and define fusion rules that focus on pedestrian appearance, which is not guaranteed with methods that optimize low-level image quality metrics.
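    A minimal sketch (PyTorch) of the two-term objective described above, combining an RGB-similarity term with an auxiliary detection term. The networks `fusion_net` and `detector`, the detector's `loss` method, and the weight `alpha` are illustrative assumptions, not the paper's exact components.

```python
import torch
import torch.nn.functional as F

def fusion_loss(fusion_net, detector, rgb, thermal, boxes_gt, alpha=0.1):
    # fuse RGB (3 channels) and thermal (1 channel) into a 3-channel output
    fused = fusion_net(torch.cat([rgb, thermal], dim=1))
    # 1) similarity term: keep the fused image close to the RGB input
    sim_loss = F.l1_loss(fused, rgb)
    # 2) auxiliary detection term: pedestrians must remain detectable in the output
    det_loss = detector.loss(fused, boxes_gt)  # assumed detector API
    return sim_loss + alpha * det_loss
```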

    Efficient training procedures for multi-spectral demosaicing

    The simultaneous acquisition of multi-spectral images on a single sensor can be performed efficiently by single-shot capture using a multi-spectral filter array. This paper focuses on the demosaicing of color and near-infrared bands and relies on a convolutional neural network (CNN). To train the deep learning model robustly and accurately, it is necessary to provide enough training data with sufficient variability. We focus on the design of an efficient training procedure by discovering an optimal training dataset. We propose two data selection strategies, motivated by slightly different concepts; the general term used for the proposed models trained with data selection is data selection-based multi-spectral demosaicing (DSMD). The first is clustering-based data selection (DSMD-C), whose goal is to discover a representative subset with high variance so as to train a robust model. The second is adaptive data selection (DSMD-A), a self-guided approach that selects new data based on the current model accuracy. We performed a controlled experimental evaluation of the proposed training strategies, and the results show that a careful selection of data benefits both the speed and the accuracy of training. We are still able to achieve high reconstruction accuracy with a lightweight model.
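    A minimal sketch of clustering-based data selection in the spirit of DSMD-C: training patches are clustered and one representative is kept per cluster. The feature choice (flattened patches), the cluster count, and the use of k-means are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_representative_patches(patches, n_clusters=64):
    """patches: array of shape (N, H, W, C); returns indices of selected patches."""
    feats = patches.reshape(len(patches), -1).astype(np.float32)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(feats)
    selected = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        if len(members) == 0:
            continue
        # keep the member closest to the cluster centre as its representative
        d = np.linalg.norm(feats[members] - km.cluster_centers_[c], axis=1)
        selected.append(members[np.argmin(d)])
    return np.array(selected)
```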

    The effect of the color filter array layout choice on state-of-the-art demosaicing

    Interpolation from a Color Filter Array (CFA) is the most common method for obtaining full-color image data. Its success relies on the smart combination of a CFA and a demosaicing algorithm. Demosaicing, on the one hand, has been extensively studied. Algorithmic development over the past 20 years ranges from simple linear interpolation to modern neural-network-based (NN) approaches that encode the prior knowledge of millions of training images to fill in missing data in an inconspicuous way. CFA design, on the other hand, is less well studied, although it is still recognized to strongly impact demosaicing performance. This is because demosaicing algorithms are typically limited to one particular CFA pattern, impeding straightforward CFA comparison. This is starting to change with newer classes of demosaicing that may be considered generic or CFA-agnostic. In this study, by comparing the performance of two state-of-the-art generic algorithms, we evaluate the potential of modern CFA demosaicing. We test the hypothesis that, with the increasing power of NN-based demosaicing, the influence of optimal CFA design on system performance decreases. The experimental results support this hypothesis. Such a finding would herald the possibility of relaxing CFA requirements, providing more freedom in the choice of CFA design and producing high-quality cameras.
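    A minimal sketch of mosaicing a full-color image with an arbitrary periodic CFA layout, the kind of step needed to compare different layouts with a CFA-agnostic demosaicer. The Bayer tile shown is just one example layout, not a layout from the study.

```python
import numpy as np

def mosaic(img, cfa):
    """img: H x W x C full-color image; cfa: h x w tile of channel indices."""
    H, W, _ = img.shape
    rows = np.arange(H)[:, None] % cfa.shape[0]
    cols = np.arange(W)[None, :] % cfa.shape[1]
    channel = cfa[rows, cols]                       # per-pixel sampled channel
    return np.take_along_axis(img, channel[..., None], axis=2)[..., 0]

bayer_rggb = np.array([[0, 1],    # R G
                       [1, 2]])   # G B
```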

    Weakly supervised deep learning method for vulnerable road user detection in FMCW radar

    Millimeter-wave radar is currently the most effective automotive sensor capable of all-weather perception. In order to detect Vulnerable Road Users (VRUs) in cluttered radar data, it is necessary to model the time-frequency signal patterns of human motion, i.e. the micro-Doppler signature. In this paper we propose a spatio-temporal Convolutional Neural Network (CNN) capable of detecting VRUs in cluttered radar data. The main contribution is a weakly supervised training method which uses abundant, automatically generated labels from camera and lidar to train the model. The input to the network is a tensor of temporally concatenated range-azimuth-Doppler arrays, while the ground truth is an occupancy grid formed by objects detected jointly in camera images and lidar. Lidar provides accurate ranging ground truth, while camera information helps distinguish between VRUs and background. Experimental evaluation shows that the CNN model has superior detection performance compared to classical techniques. Moreover, the model trained with imperfect, weak supervision labels outperforms the one trained with a limited number of perfect, hand-annotated labels. Finally, the proposed method has excellent scalability due to the low cost of automatic annotation.
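    A minimal sketch of turning camera/lidar detections into a weak occupancy-grid label in the radar's range-azimuth plane. The grid resolution, maximum range, field of view, and the (range, azimuth) detection format are illustrative assumptions.

```python
import numpy as np

def weak_occupancy_grid(detections, n_range=128, n_azimuth=64,
                        max_range=50.0, fov=np.deg2rad(90)):
    """detections: iterable of (range_m, azimuth_rad) tuples for detected VRUs."""
    grid = np.zeros((n_range, n_azimuth), dtype=np.float32)
    for rng, az in detections:
        r_idx = int(np.clip(rng / max_range * n_range, 0, n_range - 1))
        a_idx = int(np.clip((az + fov / 2) / fov * n_azimuth, 0, n_azimuth - 1))
        grid[r_idx, a_idx] = 1.0                    # mark the cell as occupied
    return grid
```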

    RGB-NIR demosaicing using deep residual U-Net

    Multi-spectral image acquisition brings numerous potential benefits in computer vision and image processing applications. Single-sensor acquisition helps to overcome the misalignment problems that occur in multiple-sensor acquisition. However, the single-sensor approach poses the problem of interpolating the missing values. In this paper we propose an adapted version of a residual U-Net, applied to demosaicing. The experiments show that the proposed method achieves state-of-the-art results and has good generalization capabilities to different color filter array patterns.
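    A minimal sketch (PyTorch) of a residual building block of the kind used in a residual U-Net; channel counts and the 1x1 skip projection are assumptions, not the paper's exact architecture.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1))
        # 1x1 projection so the identity path matches the output channel count
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))
```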

    HDR video synthesis for vision systems in dynamic scenes

    High dynamic range (HDR) image generation from a number of differently exposed low dynamic range (LDR) images has been extensively explored in the past few decades, and as a result of these efforts a large number of HDR synthesis methods have been proposed. Since HDR images are synthesized by combining well-exposed regions of the input images, one of the main challenges is dealing with camera or object motion. In this paper we propose a method for the synthesis of HDR video from a single camera using multiple, differently exposed video frames with circularly alternating exposure times. One of the potential applications of the system is in driver assistance systems and autonomous vehicles, which involve significant camera and object movement, non-uniform and temporally varying illumination, and a requirement for real-time performance. To achieve these goals simultaneously, we propose an HDR synthesis approach based on weighted averaging of aligned radiance maps. The computational complexity of high-quality optical flow methods for motion compensation is still prohibitively high for real-time applications. Instead, we rely on more efficient global projective transformations to compensate for camera movement, while moving objects are detected by thresholding the differences between the transformed and brightness-adapted images in the set. To attain temporal consistency of the camera motion in consecutive HDR frames, the parameters of the perspective transformation are stabilized over time by means of computationally efficient temporal filtering. We evaluated our results on several reference HDR videos, on synthetic scenes, and on 14-bit raw images taken with a standard camera.
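    A minimal sketch of the fusion step under simplifying assumptions: each already-aligned frame is scaled to a common radiance scale by its exposure time, compared against the reference after brightness adaptation, and moving pixels are rejected before weighted averaging. The well-exposedness weight and threshold are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def fuse_radiance(frames, exposures, ref_idx=0, thresh=0.1):
    """frames: list of aligned uint8 images; exposures: matching exposure times."""
    radiances = [f.astype(np.float32) / t for f, t in zip(frames, exposures)]
    ref = radiances[ref_idx]
    acc, wsum = np.zeros_like(ref), np.zeros_like(ref)
    for rad, frame in zip(radiances, frames):
        w = 1.0 - np.abs(frame.astype(np.float32) / 255.0 - 0.5) * 2   # well-exposedness
        motion = np.abs(rad - ref) > thresh * (ref + 1e-3)             # brightness-adapted difference
        w = np.where(motion, 0.0, w)                                   # reject moving pixels
        acc += w * rad
        wsum += w
    return np.where(wsum > 0, acc / np.maximum(wsum, 1e-6), ref)
```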

    Multi-focus image fusion based on edge-preserving filters

    To overcome the limitations of camera imaging and acquire richer information from multi-focus images, we present a novel multi-focus image fusion method based on edge-preserving filters. In this paper, the focusing level is measured by two cost functions and the focus map is constructed in a winner-take-all manner. Furthermore, we demonstrate that the guided image filter (GIF) and the fast global smoother (FGS) have different advantages in image structure transferring and image smoothing, which can be exploited to construct precise fusion weight maps. Combining the weight maps acquired by GIF and FGS, the accurate all-in-focus image is obtained using a secondary fusion strategy. Experimental results show that the proposed method is competitive with or even outperforms many state-of-the-art methods, including a recent CNN-based fusion method, while being less time-consuming.
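    A minimal sketch of the general pipeline for two input images: a Laplacian-energy focus measure, a winner-take-all decision map, and an edge-aware refinement of the weights. The simple guided filter here stands in for the GIF/FGS combination used in the paper, and the radius, epsilon, and focus measure are assumptions.

```python
import cv2
import numpy as np

def guided_filter(guide, src, r=8, eps=1e-3):
    # box-filter implementation of the classic guided filter (grayscale guide)
    mean = lambda x: cv2.blur(x, (2 * r + 1, 2 * r + 1))
    m_i, m_p = mean(guide), mean(src)
    cov = mean(guide * src) - m_i * m_p
    var = mean(guide * guide) - m_i * m_i
    a = cov / (var + eps)
    b = m_p - a * m_i
    return mean(a) * guide + mean(b)

def fuse_multifocus(img_a, img_b):
    ga, gb = [cv2.cvtColor(i, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255
              for i in (img_a, img_b)]
    fa, fb = [np.abs(cv2.Laplacian(g, cv2.CV_32F)) for g in (ga, gb)]  # focus measure
    weight = (fa >= fb).astype(np.float32)                             # winner-take-all map
    weight = np.clip(guided_filter(ga, weight), 0, 1)[..., None]       # edge-aware refinement
    return (weight * img_a + (1 - weight) * img_b).astype(np.uint8)
```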

    Low-Complexity Deep HDR Fusion and Tone Mapping for Urban Traffic Scenes

    In this paper we propose a computationally efficient neural network for high dynamic range fusion and tone mapping, for application in perception systems of autonomous vehicles. The proposed approach fuses two consecutive, differently exposed images into a single output with good exposure in all regions, in a standard dynamic range. Motion is compensated based on fast optical flow estimation, and subsequently by including an error mask as an input to the network to indicate the remaining artifact-prone regions. This is an efficient way for the network to learn to reduce ghosting artifacts without increasing computational complexity. Unlike the conventional approach, we train the network on versatile traffic data and evaluate the performance based on object detection quality metrics rather than visual quality. The performance was compared to a similarly complex representative method from the literature. We achieved improved performance in challenging light conditions due to the robustness of our method in variable traffic conditions.
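    A minimal sketch of the error-mask idea under simplifying assumptions: the short exposure is warped towards the long exposure with fast (Farneback) optical flow, the remaining brightness-adapted difference is thresholded into a mask, and the mask is stacked with the two exposures as network input. The gain ratio, threshold, and flow parameters are illustrative, not the paper's settings.

```python
import cv2
import numpy as np

def build_network_input(long_exp, short_exp, gain_ratio=4.0, thresh=0.08):
    g_long = cv2.cvtColor(long_exp, cv2.COLOR_BGR2GRAY)
    g_short = cv2.cvtColor(short_exp, cv2.COLOR_BGR2GRAY)
    # flow from the long to the short exposure; used to backward-warp the short one
    flow = cv2.calcOpticalFlowFarneback(g_long, g_short, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g_long.shape
    grid = np.stack(np.meshgrid(np.arange(w), np.arange(h)), axis=-1).astype(np.float32)
    warped = cv2.remap(short_exp, grid[..., 0] + flow[..., 0],
                       grid[..., 1] + flow[..., 1], cv2.INTER_LINEAR)
    # brightness-adapted residual flags regions where alignment failed
    residual = np.abs(long_exp.astype(np.float32) / 255 -
                      np.clip(warped.astype(np.float32) * gain_ratio, 0, 255) / 255)
    mask = (residual.max(axis=-1, keepdims=True) > thresh).astype(np.float32)
    return np.concatenate([long_exp / 255.0, warped / 255.0, mask], axis=-1)
```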

    Automatic labeling of vulnerable road users in multi-sensor data

    A growing interest in technologies for autonomous driving emphasizes the demand for safe and reliable perception systems in various driving conditions. The current state-of-the-art perception solutions rely on data-driven machine learning approaches and require large amounts of annotated data to train accurate models. In this study we identify limitations in the existing radar-based traffic datasets and propose a richer, annotated raw radar dataset. The proposed solution is a semi-automatic data labeling tool, which generates an initial set of candidate annotations using state-of-the-art automatic object recognition algorithms and requires only minimal manual intervention. In the first qualitative evaluation of its kind for automotive radar datasets, we measure the quality of automatically computed labels under various light conditions, occlusion, behavior and modeling bias, based on a multitude of tracking metrics. We determine the specific cases where automatic labeling is sufficient and where a human annotator needs to inspect and manually correct errors made by the algorithms.
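    A minimal sketch of the semi-automatic triage idea: candidate labels produced by automatic detectors are accepted directly when they are confident and consistent across sensors, and queued for manual review otherwise. The candidate format and both thresholds are illustrative assumptions.

```python
def triage_candidate_labels(candidates, conf_thresh=0.8, iou_thresh=0.5):
    """candidates: list of dicts with 'score' and 'camera_lidar_iou' fields (assumed)."""
    accepted, needs_review = [], []
    for cand in candidates:
        if cand["score"] >= conf_thresh and cand["camera_lidar_iou"] >= iou_thresh:
            accepted.append(cand)          # trusted automatic label
        else:
            needs_review.append(cand)      # hand over to a human annotator
    return accepted, needs_review
```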